FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test
The maximum mean discrepancy (MMD) is a recently proposed test statistic for
the two-sample test. Its quadratic time complexity, however, greatly hampers its
use in large-scale applications. To accelerate the MMD calculation, in
this study we propose an efficient method called FastMMD. The core idea of
FastMMD is to equivalently transform the MMD with shift-invariant kernels into
the amplitude expectation of a linear combination of sinusoid components based
on Bochner's theorem and Fourier transform (Rahimi & Recht, 2007). Taking
advantage of sampling of the Fourier transform, FastMMD decreases the time
complexity for MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and $d$
are the size and dimension of the sample set, respectively. Here $L$ is the
number of basis functions for approximating kernels, which determines the
approximation accuracy. For kernels that are spherically invariant, the
computation can be further accelerated to $O(L N \log d)$ by using the Fastfood
technique (Le et al., 2013). The uniform convergence of our method has also
been theoretically proved for both the unbiased and biased estimates. We have
further provided a geometric explanation for our method, namely an ensemble of
circular discrepancy, which helps in understanding the insight behind MMD
and may inspire further metrics for the two-sample
test. Experimental results substantiate that FastMMD has similar accuracy
to exact MMD, with faster computation and lower variance than
existing MMD approximation methods.
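The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' FastMMD implementation: it assumes a Gaussian kernel with bandwidth `sigma`, estimates only the biased MMD^2, and uses plain Gaussian frequency sampling (no Fastfood acceleration). The function names `exact_mmd2_biased` and `rff_mmd2_biased` and the parameter `num_features` (playing the role of $L$) are hypothetical names chosen for this illustration. By Bochner's theorem the Gaussian kernel is the expectation of cos(w^T (x - y)) over w ~ N(0, I / sigma^2), so the biased MMD^2 equals the expected squared amplitude of the difference between the two empirical characteristic functions.

```python
import numpy as np


def exact_mmd2_biased(X, Y, sigma=1.0):
    """Biased MMD^2 with a Gaussian kernel; costs O(N^2 d) time."""
    def gram(A, B):
        # Pairwise squared Euclidean distances between rows of A and B.
        sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
        return np.exp(-sq / (2.0 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()


def rff_mmd2_biased(X, Y, sigma=1.0, num_features=2000, seed=0):
    """Random-frequency approximation of the biased MMD^2; costs O(L N d) time.

    Assumption: Gaussian kernel, so frequencies are drawn from N(0, I / sigma^2).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    # Empirical characteristic function of each sample at every sampled frequency.
    phi_x = np.exp(1j * (X @ W)).mean(axis=0)
    phi_y = np.exp(1j * (Y @ W)).mean(axis=0)
    # Squared amplitude of the difference, averaged over the sampled frequencies.
    return float(np.mean(np.abs(phi_x - phi_y) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(200, 3))
    Y = rng.normal(0.5, 1.0, size=(200, 3))
    print("exact :", exact_mmd2_biased(X, Y))
    print("approx:", rff_mmd2_biased(X, Y))
```

Increasing `num_features` tightens the approximation at a cost that grows linearly in the number of sampled frequencies, which is the trade-off the abstract attributes to $L$.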
Phraseology in Corpus-Based Translation Studies: A Stylistic Study of Two Contemporary Chinese Translations of Cervantes's Don Quijote
The present work sets out to investigate the stylistic profiles of two modern Chinese versions of
Cervantes’s Don Quijote (I): by Yang Jiang (1978), the first direct translation from Castilian to Chinese,
and by Liu Jingsheng (1995), which is one of the most commercially successful versions of the
Castilian literary classic. This thesis focuses on a detailed linguistic analysis carried out with the help
of the latest textual analytical tools, natural language processing applications and statistical packages.
The type of linguistic phenomenon singled out for study is four-character expressions (FCEXs), which
are a very typical category of Chinese phraseology. The work opens with the creation of a descriptive
framework for the annotation of linguistic data extracted from the parallel corpus of Don Quijote.
Subsequently, the classified and extracted data are put through several statistical tests. The results of
these tests prove to be very revealing regarding the different use of FCEXs in the two Chinese
translations. The computational modelling of the linguistic data would seem to indicate that among
other findings, while Liu’s use of archaic idioms has followed the general patterns of the original and
also of Yang’s work in the first half of Don Quijote I, noticeable variations begin to emerge in the
second half of Liu’s more recent version. Such an idiosyncratic use of archaisms by Liu, which may be
defined as style shifting or style variation, is then analyzed in quantitative terms through the application
of the proposed context-motivated theory (CMT). The results of applying the CMT-derived statistical
models show that the detected stylistic variation may well point to the internal consistency of the
translator in rendering the second half of Part I of the novel, which reflects his freer, more creative and
experimental style of translation. Through the introduction and testing of quantitative research methods
adapted from corpus linguistics and textual statistics, this thesis has made a major contribution to
methodological innovation in the study of style within the context of corpus-based translation studies.
Density estimation for grouped data with application to line transect sampling
Line transect sampling is a method used to estimate wildlife populations,
with the resulting data often grouped in intervals. Estimating the density from
grouped data can be challenging. In this paper we propose a kernel density
estimator of wildlife population density for such grouped data. Our method uses
a combined cross-validation and smoothed bootstrap approach to select the
optimal bandwidth for grouped data. Our simulation study shows that with the
smoothing parameter selected with this method, the estimated density from
grouped data matches the true density more closely than with other approaches.
Using smoothed bootstrap, we also construct bias-adjusted confidence intervals
for the value of the density at the boundary. We apply the proposed method to
two grouped data sets, one from a wooden stake study where the true density is
known, and the other from a survey of kangaroos in Australia.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS307 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)